Abstract - Network topology and its relationship to tie strengths may hinder or enhance the 
spreading of information in social networks. We study the correlations between tie strengths 
and topology in networks of scientific collaboration, and show that these are very different from 
ordinary social networks. For the latter, it has earlier been shown that strong ties are associated 
with dense network neighborhoods, while weaker ties act as bridges between these. Because of 
this, weak links act as bottlenecks for the diffusion of information. We show that on the contrary, 
in co-authorship networks dense local neighborhoods mainly consist of weak links, whereas strong 
links are more important for overall connectivity. The important role of strong links is further 
highlighted in simulations of information spreading, where their topological position is seen to 
speed up spreading dynamics. Thus, in contrast to ordinary social networks, weight-topology 
correlations enhance the flow of information across scientific collaboration networks. 



Introduction. One of the key insights of network theory is that the structure of networks reflects their function and also sets constraints on dynamical processes taking place on networks [1]. Such structure may be a direct consequence of evolutionary forces acting on the entire system [2][3], such as for modules performing specific tasks in networks of metabolism or genetic regulation [3]. Alternatively, the structure may arise in an emergent fashion from the actions of the individual nodes of the network. This is the case for social networks, where individuals attempt to satisfy their basic social needs related to emotional support, social cohesion, and access to resources and information, while under spatial, time and cognitive constraints [5-9]. In addition, the evolution of networks of social interaction may be influenced by external driving forces; this is especially true for professional networks such as the networks of scientific collaboration considered in this Letter.

Social networks are in general characterized by the existence of dense, cohesive social groups that arise out of the above-mentioned individual-level mechanisms and constraints. A prominent mechanism giving rise to dense social groups is triadic closure [10,11], i.e. getting to know people through the people we know. Simultaneously, the interplay of several factors, such as homophily, where individuals of similar characteristics prefer to form ties [10], the need for emotional support and social cohesion, and the high maintenance costs of strong ties, gives rise to correlations between tie strengths and group structure. The existence of such correlations was hypothesized by Granovetter [5] as early as the 1970s: strong ties are associated with dense network neighborhoods, whereas weak links act as bridges between these. This weak-link hypothesis has since been confirmed with the help of electronic communication records [12-14]. This particular relationship between tie strengths and network structure has several important consequences. First, weak links play a crucial role for the connectivity of the entire network [15]. Second, because of this, they also act as bottlenecks for the diffusion and spreading of information on the network: when compared with a null model where tie strengths are replaced by the network average, simulated spreading of information is slower [12].

However, in networks of professional collaboration, such bottlenecks for information diffusion would act against the purposes of individuals in the network. Whereas networks of scientific collaboration display many characteristic features of ordinary social networks, such as prominent community structure (see, e.g., HMH]), they are also shaped by different driving mechanisms. First, one can argue that the structure of the underlying space of ideas and scientific knowledge is reflected in the network structure [5D]. Second, in addition to the need for cohesive sharing and processing of information in small groups, there is a particularly strong need for avoiding scientific isolation by efficient transmission and brokerage of information in the network [TWE]. These needs are likely to manifest in the network structure. In this Letter, motivated by the above considerations and observations of anomalous weight-topology correlations in collaboration networks [22], we show that unlike for "everyday" social networks, the correlations between tie strengths and network topology enhance the spreading of information in networks of scientific collaboration.

This paper is structured as follows: first, we describe the source data and characteristics of the co-authorship networks. Then, we address correlations between link strengths and the surrounding network density, and show that in scientific collaboration networks, dense network surroundings are associated with weak instead of strong links. We further corroborate this result by studying cliques, and show with percolation analysis that strong links are more important to overall connectivity. We then study the relationship of tie strengths to community structure at several levels of coarse-graining. Finally, using simulated spreading of information, we show that weight-topology correlations give rise to fast spreading dynamics.

Data Sets. We consider two datasets: the first contains all articles published in the arXiv [23] until March 2010 (595,276 papers), and the second all articles published in Physical Review (PR) journals [24] between 1893 and 2009 (463,357 papers). From these data we extract the list of authors, identified by their surname and first two initials. As our focus is on ties that have social aspects, we ignore articles with more than 10 authors (~2% of all articles in each set) to filter out the huge collaborations in e.g. hep-ex and astro-ph, where the number of authors can reach ~1,000 and thus not all authors are likely to know each other. We collapse the bipartite author-paper networks to co-authorship networks by connecting scientists who have co-authored one or more articles. We then extract the largest connected components (arXiv: N = 181,979 nodes and L = 995,637 links; PR: N = 203,245 nodes and L = 1,198,002 links). These amount to 88.5% and 94.4% of the total numbers of authors in the datasets, respectively.

Fig. 1: Link and author statistics for the arXiv and PR networks. a) The cumulative link weight distributions. b) The cumulative distribution of the publication ages of authors (in days). c) The average link weight as a function of the geometric mean of endpoint author ages for the APS network (circles). The colors denote the corresponding vertically normalized probability distributions. d) The average link age in days as a function of the link weight for the APS network. The plots corresponding to panels c) and d) are qualitatively similar for the arXiv network.

For the tie strengths, i.e. link weights, of the unipartite projections we use the formula introduced by Newman [25]: $w_{ij} = \sum_p \frac{1}{n_p - 1}$, where the sum runs over the set of papers $p$ on which authors $i$ and $j$ collaborate and $n_p$ is the number of co-authors of paper $p$. Single-author papers are excluded. The motivation behind this commonly used formula is that an author divides his/her time between the $n_p - 1$ other authors, and thus the strength of the connection should vary inversely with $n_p - 1$. It should be noted that this definition of tie strength is not the only possible choice; however, it is in our view reasonable to assume that joint work on a paper with a large number of authors contributes less than, say, a two-author paper.¹
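As an illustration of the weighting formula, the following minimal Python sketch builds the weighted co-authorship links from a list of papers. The function name `coauthorship_weights`, the `max_authors` argument and the toy input are hypothetical and are not taken from the original analysis; the sketch only assumes that each paper is available as a list of author identifiers.

```python
from collections import defaultdict
from itertools import combinations


def coauthorship_weights(papers, max_authors=10):
    """Return {(i, j): w_ij} with w_ij = sum over joint papers of 1 / (n_p - 1)."""
    weights = defaultdict(float)
    for authors in papers:
        authors = sorted(set(authors))  # one entry per author, fixed pair order
        n_p = len(authors)
        # Skip single-author papers and collaborations above the size cutoff.
        if n_p < 2 or n_p > max_authors:
            continue
        for i, j in combinations(authors, 2):
            weights[(i, j)] += 1.0 / (n_p - 1)
    return dict(weights)


# Hypothetical toy input: each paper is a list of author identifiers.
toy_papers = [["A", "B"], ["A", "B", "C"], ["C"]]
toy_weights = coauthorship_weights(toy_papers)
print(toy_weights)
```

In the toy input, authors A and B share one two-author paper and one three-author paper, so their link weight is 1 + 1/2 = 1.5, while the links A-C and B-C each receive 1/2.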
Results. 

Basic characteristics. For both sets of data, the overall network properties are in accordance with earlier observations [25-27]: the distributions of degree $k$ (number of links of a node) and strength $s$ (sum of link weights of a node) are heavy-tailed. Further, the strength approximately depends on degree as $\langle s \rangle \propto k \langle w \rangle$, where $\langle w \rangle$ is the average link weight. The weight distribution is also broad (Fig. 1a). As high link weights are typically accumulated over time between senior scientists, we define the publication age $a_i$ of scientist $i$ as the time elapsed between his/her first and last publications in our records. Fig. 1b) displays the cumulative distribution of such publication ages. Its shape confirms that most of the scientists in the data can be considered junior, reflecting the hierarchy of the scientific profession where the number of professors and other senior scientists is significantly smaller than that of junior scientists. Fig. 1c) displays the average link weight as a function of the geometric mean of the publication ages of the endpoint authors; as expected, the link weights between senior scientists are on average higher. Similarly to the publication age of scientists, we define the age $a_{ij}$ of a co-authorship link as the difference between the dates of the last and first joint publications of the two authors $i$ and $j$. As expected, this quantity increases on average with the weight of the link (Fig. 1d).

¹ For assessing the robustness of our results, we have carried out a similar analysis with an alternative weighting scheme where $w_{ij} = \sum_p \frac{1}{(n_p - 1)^{\beta}}$. With $\beta = 1$ we recover the original scheme, and with $\beta = 0$ weights are insensitive to the number of authors of a paper. For $\beta = 0.25$ and $\beta = 0.5$, all our results hold, while for $\beta = 0$ resolution is lost for percolation and spreading analysis, as 67% of the links have unit weight; nevertheless, the rest of the results are qualitatively similar to the ones presented here.

Fig. 2: The dependence of link overlap on link weight for (a) the arXiv and (b) the PR co-authorship networks. Circles indicate averages, and colors denote vertically normalized probability distributions.

Dependence of neighborhood overlap on link weight. We begin our exploration of the weight-topology correlations by considering the neighborhood overlap of links. The overlap $O_{ij}$ measures the fraction of neighbors common to the endpoint nodes of a link, and has earlier been observed to increase with link weight in a communication network [12], in accordance with the Granovetter hypothesis [5]. The overlap of a link is defined as

$$O_{ij} = \frac{n_{ij}}{(k_i - 1) + (k_j - 1) - n_{ij}}, \tag{1}$$

where $n_{ij}$ is the number of neighbors common to the endpoint nodes $i$ and $j$, and $k_i$ and $k_j$ are their respective degrees. In contrast to earlier results, we find that the overlap decreases with link weight for both co-authorship networks (Fig. 2) for the vast majority of links. This decrease is followed by an increase for the very highest-weight links in the tail of the weight distribution. Their number is very small: for the arXiv network, the section of the curve where $w_{ij} > 2$ only corresponds to about 5.3% of the links, and for the PR network, to about 3.3% of the links. Hence, in stark contrast to ordinary social networks, the weak links mainly reside inside dense network neighborhoods, whereas strong links act as connectors between these. Such weight-topology correlations reflect the hierarchy of the scientific profession. As seen in Fig. 1c) and d), weak links can mainly be attributed to research groups that include junior scientists, whereas strong links connect senior scientists of different groups. Further, the strongest links with high overlap belong to dense neighborhoods, indicating long-term collaborations between senior scientists of the same research group.
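The neighborhood overlap of a link can be sketched in a few lines of Python. This is a minimal illustration under the assumption that the network is stored as a dictionary mapping each node to the set of its neighbors; the function name `link_overlap` and the toy network are hypothetical. The sketch uses the standard overlap definition $O_{ij} = n_{ij} / ((k_i - 1) + (k_j - 1) - n_{ij})$, and returning 0 when the denominator vanishes is a convention chosen here.

```python
def link_overlap(adj, i, j):
    """Overlap of an existing link (i, j): shared neighbors over all other neighbors."""
    n_ij = len(adj[i] & adj[j])  # number of common neighbors
    # Degrees minus the link itself, with common neighbors counted only once.
    denom = (len(adj[i]) - 1) + (len(adj[j]) - 1) - n_ij
    return n_ij / denom if denom > 0 else 0.0


# Hypothetical toy network: a triangle a-b-c plus a pendant node d attached to a.
toy_adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
for link in [("a", "b"), ("b", "c"), ("a", "d")]:
    print(link, link_overlap(toy_adj, *link))
```

In the toy network, the link b-c lies entirely inside the triangle and has overlap 1, the link a-b shares one of its two other neighbors and has overlap 0.5, and the link a-d has no common neighbors and thus overlap 0.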




Fig. 3: a) to c) The probability distribution of the intensity of cliques of orders k = 3, 4, 5 in the arXiv collaboration network and in the reference ensemble. The reference ensemble is generated by shuffling the weights of the empirical network while keeping its topology fixed; the reference distribution is an average over 100 realizations. The corresponding curves in the PR network are qualitatively similar. d) The average publication age of the links of cliques as a function of their intensity.



Table 1: Median intensity $I_{1/2}$ of cliques of order $k$, in the original arXiv and PR collaboration networks (O) and the weight-shuffled reference ensembles (R). The order of magnitude of the standard deviation of the median across 100 realizations of the weight-shuffled reference ensemble is also indicated.

       arXiv                      PR
  k    I_1/2 (O)   I_1/2 (R)         I_1/2 (O)   I_1/2 (R)
  3    0.330       0.347 (10^-?)     0.297       0.312 (10^-?)
  4    0.323       0.354 (10^-5)     0.303       0.316 (10^-4)
  5    0.326       0.357 (10^-4)     0.327       0.318 (10^-?)
  6    0.331       0.358 (10^-4)     0.401       0.320 (10^-?)

Clique intensity distribution. The above results indicate that weak links are in general associated with dense network neighborhoods. For further evidence, we have investigated subgraph weights by applying the concept of clique intensity p^ll28], designed for studying the coupling between the link weights and network structure. The intensity of a subgraph $g$ with nodes $v_g$ and links $l_g$ is given by the geometric mean of its weights as

$$I(g) = \Big( \prod_{(ij) \in l_g} w_{ij} \Big)^{1/|l_g|},$$

where $|l_g|$ is the number of links in $g$. Similarly to Ref. [13], we detect all $k$-cliques, that is, fully connected subgraphs of $k$ nodes in the network, and calculate the distribution of their intensities. As expected, the number of cliques of any order is much larger than in a random configuration model with the same degree sequence. As a reference, we also calculate clique intensities in an ensemble where the weights of the original network are randomly reshuffled, i.e. exchanged between its links, while the original topology and the number of cliques are retained. Note that as the collaboration networks are projections of bipartite networks, there is an abundance of cliques of various sizes. The intensity distributions of $k$-cliques for the original network and for the reference ensemble are displayed in Fig. 3 for the arXiv network, with $k = 3, 4, 5$. First, we observe that in the original network, the distribution of clique intensities is broad. There is a very high number of low-intensity cliques, corroborating the overlap results. This is in contrast with the results reported for the communication network in Ref. [13], where the intensities are centered around a well-defined mean. Overall, the median clique intensities in the reference ensemble are larger than in the original networks (see Table 1). The abundance of low-intensity cliques is further highlighted when compared to the intensity distribution of the reference ensemble. The broad distribution of the original network also indicates that there is a small number of cliques with very high intensities: as indicated in panel d), which shows the average publication age of the links in cliques as a function of their intensity, such rare high-intensity cliques correspond to strong collaborations between senior scientists.
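The clique intensity and the weight-shuffled reference ensemble can be illustrated with a short Python sketch. It assumes that link weights are stored in a dictionary keyed by sorted node pairs; the names `clique_intensity` and `shuffle_weights` and the toy weights are hypothetical, and clique detection itself is not shown.

```python
import math
import random
from itertools import combinations


def clique_intensity(nodes, weights):
    """Geometric mean of the link weights of the clique spanned by `nodes`."""
    links = [tuple(sorted(pair)) for pair in combinations(nodes, 2)]
    # Average the logarithms to avoid overflow for large cliques.
    log_sum = sum(math.log(weights[link]) for link in links)
    return math.exp(log_sum / len(links))


def shuffle_weights(weights, rng):
    """Permute the weights among the links; the set of links stays the same."""
    links = list(weights)
    values = list(weights.values())
    rng.shuffle(values)
    return dict(zip(links, values))


# Hypothetical toy network: triangle a-b-c plus one extra link c-d.
toy_w = {("a", "b"): 1.0, ("a", "c"): 2.0, ("b", "c"): 4.0, ("c", "d"): 0.5}
print(clique_intensity(("a", "b", "c"), toy_w))
reference = shuffle_weights(toy_w, random.Random(1))
print(clique_intensity(("a", "b", "c"), reference))
```

For the toy triangle the intensity is the cube root of 1 x 2 x 4, i.e. 2 up to floating-point error. Repeating `shuffle_weights` over many realizations and collecting the clique intensities would give a reference distribution of the kind described above.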

Percolation analysis. In order to understand the role of strong and weak links in the global connectivity of the network, we next address link percolation in the collaboration networks. Similarly to Ref. [12], we first remove the links of the network in decreasing and increasing order of weight, and keep track of the relative size of the largest connected component of nodes $S_{\max}/N$ as a function of the fraction of removed links $f$. The results are displayed in Fig. 4a) for the arXiv and Fig. 4c) for the PR network. Both networks are remarkably robust to link removal, as the giant component only disappears when almost all links have been removed, reflecting the broad degree distributions. This is true for both orders of link removal. However, it is clear that the giant component shrinks much faster when the strongest links are removed first, indicating their important role for the overall connectivity of the network. Again, this behavior is opposite to earlier observations [12][U, where the removal of weak links disrupts the connectivity faster. We have also performed a similar analysis for overlap; it is seen that when removing low-overlap links first, the network fragments faster, and the giant component disappears earlier than for weight removal (Figs. 4b) and d)). This behavior can be attributed to modular structure, where the low-overlap links connect dense regions of high-overlap links.
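The link-removal procedure can be sketched as follows in Python. This is a minimal, unoptimized illustration with hypothetical names (`largest_component_fraction`, `percolation_curve`) and a toy network; it recomputes the components from scratch for every removal step using a union-find structure.

```python
def largest_component_fraction(nodes, links):
    """Relative size S_max / N of the largest connected component."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for i, j in links:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
    sizes = {}
    for v in nodes:
        root = find(v)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / len(nodes)


def percolation_curve(nodes, weights, strongest_first):
    """List of (f, S_max / N) as links are removed in order of weight."""
    order = sorted(weights, key=weights.get, reverse=strongest_first)
    return [
        (removed / len(order), largest_component_fraction(nodes, order[removed:]))
        for removed in range(len(order) + 1)
    ]


# Hypothetical toy network: two weakly linked triangles joined by one strong link.
toy_nodes = list("abcdef")
toy_w = {
    ("a", "b"): 0.1, ("a", "c"): 0.2, ("b", "c"): 0.3,
    ("d", "e"): 0.15, ("d", "f"): 0.25, ("e", "f"): 0.35,
    ("c", "d"): 3.0,
}
strong_first = percolation_curve(toy_nodes, toy_w, strongest_first=True)
weak_first = percolation_curve(toy_nodes, toy_w, strongest_first=False)
print(strong_first[1], weak_first[1])
```

In this toy case, removing the single strongest link already halves the largest component, whereas removing the weakest link leaves the network connected, which mimics the qualitative behavior described above. For networks of the size considered here, the repeated recomputation would be slow; adding links in reverse order to a single union-find structure is a common, more efficient alternative.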

Fig. 4: Link percolation in the (a,b) arXiv and (c,d) PR co-authorship networks. In all panels, the horizontal axis represents the fraction of removed links $f$ and the vertical axis the relative size of the giant connected component $S_{\max}/N$. Panels a) and c) show the dependence of the giant component size on $f$ when links are removed in the order of increasing (solid lines) and decreasing (dashed lines) weight. For both networks, removing the strongest links first shrinks the size of the giant component fastest. In panels b) and d), links are for reference removed in the order of increasing (solid lines) or decreasing (dashed lines) overlap; as expected, removing low-overlap links first breaks the giant component faster.

Modularity analysis. To conclude our study of weight-topology correlations, we address the mesoscopic structure of the co-authorship networks at different levels of organization with community detection. The detection is based on the structure of the networks alone, i.e. unweighted links are used for detecting the communities, and the relationship of link weights to the detected communities is then studied. In order to detect communities at different levels of coarse-graining, we used the parametric generalization of modularity $Q$ introduced in Ref. [Tni[53] as

$$Q_\gamma = \frac{1}{2L} \sum_{ij} \left( A_{ij} - \gamma P_{ij} \right) \delta(c_i, c_j),$$

where $A_{ij} = 1$ if $i$ and $j$ are connected and 0 otherwise, $\gamma$ is the resolution parameter, $P_{ij} = k_i k_j / 2L$ represents the null model, and $\delta(c_i, c_j) = 1$ if the community assignments $c_i$ and $c_j$ of the two nodes are the same and 0 otherwise. An optimal partition corresponding to each value of $\gamma$ is obtained by maximizing the value of $Q_\gamma$. The resolution parameter $\gamma$ allows for tuning the characteristic size of the modules. At small values of $\gamma$, large communities will be detected. When $\gamma$ is increased, the optimization of $Q_\gamma$ leads to smaller and smaller communities in the optimal partition. We use the Louvain method [30] to determine the optimal partition corresponding to the maximum $Q_\gamma$.
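To make the role of the resolution parameter concrete, the following Python sketch evaluates the resolution-dependent modularity of a given partition. It assumes the common form of the quantity, a sum over node pairs in the same community of $A_{ij} - \gamma k_i k_j / 2L$ divided by $2L$, rewritten as a sum over communities. The function name `generalized_modularity` and the toy network are hypothetical, and the sketch only scores partitions; it does not perform the Louvain optimization.

```python
def generalized_modularity(links, community, gamma=1.0):
    """Q_gamma = sum over communities c of L_c / L - gamma * (K_c / 2L)^2.

    L_c is the number of links inside community c and K_c the summed degree
    of its nodes; this equals the pairwise sum over (A_ij - gamma * k_i k_j / 2L).
    """
    n_links = len(links)
    degree_sum = {}
    internal = {}
    for i, j in links:
        for v in (i, j):
            c = community[v]
            degree_sum[c] = degree_sum.get(c, 0) + 1
        if community[i] == community[j]:
            c = community[i]
            internal[c] = internal.get(c, 0) + 1
    return sum(
        internal.get(c, 0) / n_links - gamma * (k_c / (2 * n_links)) ** 2
        for c, k_c in degree_sum.items()
    )


# Hypothetical toy network: two triangles joined by a single link.
toy_links = [("a", "b"), ("a", "c"), ("b", "c"),
             ("d", "e"), ("d", "f"), ("e", "f"), ("c", "d")]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
merged = {v: 0 for v in "abcdef"}
for gamma in (0.1, 1.0):
    print(gamma,
          generalized_modularity(toy_links, split, gamma),
          generalized_modularity(toy_links, merged, gamma))
```

In the toy example, the single merged community scores higher at $\gamma = 0.1$, while the two-triangle split scores higher at $\gamma = 1$, in line with the statement that small values of the resolution parameter favor large communities.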

Fig. 5: Community structure in the arXiv network, at different levels of resolution. For all panels, the horizontal axis corresponds to the value of the resolution parameter $\gamma$, such that the resolution runs from coarse-grained to detailed. a) The number of detected communities. b) The average size of detected communities (circles) and the corresponding vertically normalized size distribution (colors). c) The average weight of community-internal links $\langle w_{\mathrm{in}} \rangle$, normalized by the average weight of the same links in the weight-shuffled reference ensemble $\langle w_{\mathrm{in}}^{\mathrm{ref}} \rangle$. The communities were detected purely on the basis of topology, i.e. link weights were not taken into account. Results for the PR network are qualitatively similar.

Fig. 5 displays the number of communities, their size, and the average weight of their internal links relative to that in the weight-averaged reference ensemble, $\langle w_{\mathrm{in}} \rangle / \langle w_{\mathrm{in}}^{\mathrm{ref}} \rangle$, for different values of $\gamma$. For the smallest values of $\gamma$, the entire network is a single community. When $\gamma$ is increased, the method begins to pick up communities of fairly large size. For moderately large communities whose average sizes range from ~ 10 to ~ 10^, it is seen that their internal link weights are higher than randomly expected. This makes sense, as the sizes of the largest communities are of the order of fields or sub-fields of science. As $\gamma$ is further increased and the average community sizes drop below 10, so that they are roughly in the range of research groups, intra-community links have on average lower weights than randomly expected, in line with our observations on the behavior of the link overlap and the analysis of cliques. Overall, these results indicate that communities at different scales may display different weight-topology correlations.

Simulated spreading of information. To conclude our investigations, we address the spreading of information in the co-authorship networks, focusing on the role of structural correlations and the relationship between tie strengths and topology.² Earlier, it has been shown with simulations that in social networks, the prominent community structure, the effect of weak-link bottlenecks, and the time-domain features of communication slow down spreading compared to randomized reference systems [1^151]. However, for the co-authorship networks, the weak-link bottlenecks appear to be absent. To study the effects of network structure and weight-topology correlations on the spreading of information in the co-authorship networks, we simulate spreading with the simple SI (Susceptible-Infectious) model. In this model, individuals are initially in the susceptible state (S), with the exception of a seed individual whose state is set to infectious (I). The information then spreads through the links of the network, such that at every time step, each susceptible individual who is connected to an infectious individual becomes infected with some probability $P_{ij}$ that may depend on the properties of the link connecting the two nodes.

² It is worth stressing that we only address the spreading of information through weighted collaboration networks, and use spreading dynamics as a probe of network structure. In reality, the flow of information is of course not constrained to such idealized networks.

Fig. 6: SI spreading on the unweighted (a) arXiv and (b) PR networks at p = 0.01. The spreading curves are compared with a randomized network with the same degree distribution. SI spreading on the weighted (c) arXiv and (d) PR networks. The spreading curves are compared with a randomized network that has the same structure but with the link weights shuffled.

Let us first study the effect of the network topology by disregarding weights and setting $P_{ij} = p$ for all links. As a reference, we construct networks where the degree sequence of the original networks is retained but links are otherwise randomly rewired (the configuration model). This procedure destroys structural correlations such as community structure. We then run the spreading simulation on the original and reference networks by selecting random seed nodes, and observing the fraction of individuals infected with the information, Pinf, as a function of time. Figs. 6a) and b) show the resulting spreading dynamics on the arXiv and PR networks and the corresponding reference ensembles, averaged over 10'^ runs, with $p = 0.01$. In both cases, it is seen that spreading is slightly slower in the original networks. This can be attributed to community structure: the low numbers of links between communities slow down spreading.

However, when weights are introduced into the model, the situation is reversed. For spreading on weighted networks, we set $P_{ij} = p \times w_{ij}$, i.e. the transmission rate between two nodes is proportional to the link weight $w_{ij}$. Here, the parameter $p$ controls the overall spreading rate. We set $p = 1/\max(w_{ij})$, so that for the globally strongest link we have $P_{ij} = 1$ and for all others $P_{ij} < 1$. To investigate the effect of weight-topology correlations, we apply the same reference model as earlier, and randomly reshuffle link weights while keeping the network topology intact. We then simulate spreading as above. Figs. 6c) and d) show that for both networks, spreading is much faster in the original networks than in the reference, contrary to the unweighted case. This effect could also reflect the existence of a core of high-productivity scientists with strong ties, as observed in Ref. [35]. Hence, for the spreading of information, strong links and their position in the network are crucial in co-authorship networks.
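A minimal Python sketch of SI spreading with link-dependent transmission probabilities is given below. It is an illustration only: the function name `si_spread`, the adjacency format (a dictionary mapping each node to a dictionary of neighbor weights) and the toy network are hypothetical, and the synchronous update is one possible reading of the time-stepped dynamics described above.

```python
import random


def si_spread(adj_w, seed, p, rng, max_steps=10_000):
    """Fraction of infected nodes per time step, with P_ij = min(1, p * w_ij)."""
    infected = {seed}
    n_nodes = len(adj_w)
    curve = [len(infected) / n_nodes]
    for _ in range(max_steps):
        if len(infected) == n_nodes:
            break
        newly_infected = set()
        # Synchronous update: only nodes infected before this step transmit.
        for i in infected:
            for j, w_ij in adj_w[i].items():
                if j in infected or j in newly_infected:
                    continue
                if rng.random() < min(1.0, p * w_ij):
                    newly_infected.add(j)
        infected |= newly_infected
        curve.append(len(infected) / n_nodes)
    return curve


# Hypothetical toy network: a path a-b-c with a strong and a weaker link.
toy_adj = {"a": {"b": 2.0}, "b": {"a": 2.0, "c": 0.5}, "c": {"b": 0.5}}
p = 1.0 / max(w for nbrs in toy_adj.values() for w in nbrs.values())
print(si_spread(toy_adj, "a", p, random.Random(0)))
```

The unweighted case corresponds to setting all weights to 1 and choosing a fixed `p`. Comparing curves averaged over many seeds for the original weights and for randomly permuted weights on the same links would reproduce the type of comparison described in the text.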

Conclusions and Discussion. In conclusion, we 
have found that in networks of scientific collaboration, the 
relationship between tie strengths and network topology 
is different from ordinary social networks. This can be at- 
tributed to different driving mechanisms of tie formation 
and reinforcement - the strength of ties reflects the hierar- 
chy of the scientific profession as well as the needs of indi- 
viduals, such as the need for efficient access to new infor- 
mation. Using neighborhood overlap and clique intensity 
analysis, we have shown that locally dense network neigh- 
borhoods are associated with weak links, whereas stronger 
links are of importance to the overall connectivity of the 
networks. However, at a more coarse-grained resolution, 
links within large communities of the size of fields or sub- 
fields of science are on average stronger than randomly 
expected. In future work, it would be interesting to ex- 
plore these features deeper using information such as au- 
thor departments and affiliations. This observation is also 
of importance for the design of community detection meth- 
ods [33] - typically, weighted community detection meth- 
ods assume that links within dense topological clusters 
are stronger than average, and our results indicate that 
this assumption is not necessarily valid. With the help 
of simulations, we have also shown that the topological 
position of strong ties in collaboration networks increases 
the speed of spreading dynamics. Thus, weight-topology 
correlations mitigate the isolating effects of small, cohe- 
sive groups and enhance the flow of information across 
the network. 